System-Level Versus User-Defined Checkpointing
Abstract
Checkpointing and rollback recovery is a very effective technique for tolerating transient faults and preventive shutdowns. In the past, most checkpointing schemes published in the literature were intended to be transparent to the application programmer and were implemented at the operating-system level. In recent years, there has been some work on higher-level forms of checkpointing. In this second approach, the user is responsible for checkpoint placement and must specify the checkpoint contents. In this paper, we compare the two approaches: system-level and user-defined checkpointing. We discuss the pros and cons of each and present an experimental study conducted on a commercial parallel machine.
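To make the user-defined approach concrete, the following is a minimal sketch in Python and is not taken from the paper. It assumes a toy iterative computation; the names save_checkpoint, load_checkpoint, and run, the checkpoint file layout, and the fail_at parameter used to simulate a fault are all hypothetical. The programmer chooses the checkpoint contents (only the loop counter and the running total) and the checkpoint placement (every few iterations), which are exactly the two responsibilities the abstract assigns to the user.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the user-selected state atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, n_steps, every, fail_at=None):
    """Sum the squares of 0..n_steps-1, resuming from a checkpoint if present.

    fail_at is a hypothetical hook that simulates a transient fault.
    """
    # Checkpoint contents: only what is needed to resume (chosen by the user).
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    step, total = state["step"], state["total"]
    while step < n_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated transient fault")
        total += step * step
        step += 1
        # Checkpoint placement: a point where the state is consistent.
        if step % every == 0:
            save_checkpoint(path, {"step": step, "total": total})
    return total
```

If a run fails partway through, calling run again with the same path rolls back to the most recent checkpoint and repeats only the iterations after it. A system-level scheme would instead save the whole process image without any such code in the application, which is typically larger but requires no effort from the programmer.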
Similar Resources
Speculative Checkpointing
In large scale parallel systems, storing memory images with checkpointing will involve massive amounts of concentrated I/O from many nodes, resulting in considerable execution overhead. For user-level checkpointing, overhead reduction usually involves both spatial, i.e., reducing the amount of checkpoint data, and temporal, i.e., spreading out I/O by checkpointing data as soon as their values b...
Transparent Orthogonal Checkpointing through User-Level Pagers
Orthogonal persistence opens up the possibility for a number of applications. We present an approach for easily enabling transparent orthogonal persistence, basically on top of a modern μ-kernel. Not only are all data objects made persistent. Threads and tasks are also treated as normal data objects, making the threads and tasks persistent between system restarts. As such, the system is fault s...
User-level Checkpointing Through Exportable Kernel State
Checkpointing, process migration, and similar services need to have access not only to the memory of the constituent processes, but also to the complete state of all kernel provided objects (e.g., threads and ports) involved. Traditionally, a major stumbling block in these operations is acquiring and re-creating the state in the operating system. We have implemented a transparent user-mode chec...
DMTCP: Scalable User-Level Transparent Checkpointing for Cluster Computations
As the size of clusters increases, failures are becoming increasingly frequent. Applications must become fault tolerant if they are to run for extended periods of time. We present DMTCP (Distributed MultiThreaded CheckPointing), the first user-level distributed checkpointing package not dependent on a specific message passing library. This contrasts with existing approaches either specific to l...
An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...